Epidemiology
Automatically Learning Hybrid Digital Twins of Dynamical Systems
Digital Twins (DTs) are computational models that simulate the states and temporal dynamics of real-world systems, playing a crucial role in prediction, understanding, and decision-making across diverse domains. However, existing approaches to DTs often struggle to generalize to unseen conditions in data-scarce settings, a crucial requirement for such models. To address these limitations, our work begins by establishing the essential desiderata for effective DTs. Hybrid Digital Twins (HDTwins) represent a promising approach to meeting these requirements, modeling systems as a composition of mechanistic and neural components. This hybrid architecture simultaneously leverages (partial) domain knowledge and neural network expressiveness to enhance generalization, and its modular design facilitates improved evolvability. While existing hybrid models rely on expert-specified architectures with only their parameters optimized on data, automatically specifying and optimizing HDTwins remains intractable due to the complex search space and the need for flexible integration of domain priors. To overcome this complexity, we propose an evolutionary algorithm (HDTwinGen) that employs Large Language Models (LLMs) to autonomously propose, evaluate, and optimize HDTwins.
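To make the hybrid architecture concrete, here is a minimal sketch of what an HDTwin and the surrounding evolutionary loop could look like. The SIR-style mechanistic term, the residual parameterization, and the loop structure are our illustrative assumptions, not HDTwinGen's actual implementation, which searches over LLM-generated model code.

```python
import numpy as np

def mechanistic_step(x, t, beta, gamma):
    """Toy SIR-style prior dynamics: the (partial) domain-knowledge component."""
    s, i = x
    return np.array([-beta * s * i, beta * s * i - gamma * i])

def hybrid_step(x, t, theta_mech, residual_fn, theta_nn):
    """Hybrid dynamics: mechanistic drift plus a learned residual correction."""
    return mechanistic_step(x, t, *theta_mech) + residual_fn(x, t, theta_nn)

def evolve(propose_with_llm, fit_params, score, pool, generations=10):
    """Skeleton of the evolutionary loop: an LLM proposes new hybrid model
    specifications, each candidate's parameters are optimized on data, and
    the best-scoring candidates seed the next generation."""
    for _ in range(generations):
        candidates = pool + [propose_with_llm(pool)]      # LLM-driven proposal
        candidates = [fit_params(c) for c in candidates]  # optimize parameters
        pool = sorted(candidates, key=score)[:len(pool)]  # keep the fittest
    return pool[0]
```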
Rethinking Variational Inference for Probabilistic Programs with Stochastic Support
Tim Reichelt, Luke Ong, Tom Rainforth
We introduce Support Decomposition Variational Inference (SDVI), a new variational inference (VI) approach for probabilistic programs with stochastic support. Existing approaches to this problem rely on designing a single global variational guide on a variable-by-variable basis, while maintaining the stochastic control flow of the original program. SDVI instead breaks the program down into sub-programs with static support, before automatically building separate sub-guides for each. This decomposition significantly aids in the construction of suitable variational families, enabling, in turn, substantial improvements in inference performance.
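As a toy illustration of the decomposition (our construction, not the authors' code), consider a model whose latent variable's support depends on a Bernoulli branch: SDVI's strategy amounts to splitting it into two straight-line sub-programs with static support, fitting a sub-guide matched to each sub-program's support, and recombining them as a mixture. The distributions and the softmax weighting below are illustrative assumptions.

```python
import torch
import torch.distributions as dist

obs = torch.tensor(1.2)

def log_joint(branch, x):
    """Two straight-line sub-programs: the latent's support depends on the branch."""
    prior = dist.Normal(0.0, 1.0) if branch == 0 else dist.Gamma(2.0, 1.0)
    lp_branch = torch.log(torch.tensor(0.5))  # p(branch) = 0.5
    return lp_branch + prior.log_prob(x) + dist.Normal(x, 0.5).log_prob(obs)

def elbo(branch, guide, n=256):
    """Per-sub-program ELBO with a reparameterized sub-guide (here fixed;
    full SDVI would optimize each sub-guide's parameters separately)."""
    x = guide.rsample((n,))
    return (log_joint(branch, x) - guide.log_prob(x)).mean()

# One sub-guide per sub-program, matched to that sub-program's static support.
guides = {0: dist.Normal(torch.tensor(0.8), torch.tensor(0.4)),
          1: dist.LogNormal(torch.tensor(0.0), torch.tensor(0.5))}
elbos = torch.stack([elbo(b, g) for b, g in guides.items()])
weights = torch.softmax(elbos, dim=0)  # mixture weights over the sub-guides
```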
Are Language Models Actually Useful for Time Series Forecasting?
Large language models (LLMs) are being applied to time series forecasting, but are language models actually useful for time series? In a series of ablation studies on three recent and popular LLM-based time series forecasting methods, we find that removing the LLM component or replacing it with a basic attention layer does not degrade forecasting performance; in most cases, the results even improve. We also find that, despite their significant computational cost, pretrained LLMs do no better than models trained from scratch, do not represent the sequential dependencies in time series, and do not assist in few-shot settings. Additionally, we explore time series encoders and find that patching and attention structures perform similarly to LLM-based forecasters.
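A minimal sketch of the ablation protocol described above (module names and sizes are our assumptions, not the released code): keep the forecaster's patch embedding and projection head, and swap the pretrained LLM backbone for an identity map ("w/o LLM") or a single multi-head attention layer ("LLM2Attn").

```python
import torch
import torch.nn as nn

class PatchForecaster(nn.Module):
    """Patch-based forecaster with a pluggable backbone (hypothetical names)."""
    def __init__(self, patch_len=16, d_model=64, horizon=96, n_patches=32, backbone=None):
        super().__init__()
        self.embed = nn.Linear(patch_len, d_model)           # patch tokenizer
        self.backbone = backbone if backbone is not None else nn.Identity()  # "w/o LLM"
        self.head = nn.Linear(n_patches * d_model, horizon)  # forecast head

    def forward(self, patches):            # patches: (B, n_patches, patch_len)
        z = self.backbone(self.embed(patches))
        return self.head(z.flatten(1))     # (B, horizon)

class OneAttnLayer(nn.Module):
    """The 'LLM2Attn' ablation: a single multi-head self-attention block."""
    def __init__(self, d_model=64, n_heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)

    def forward(self, z):
        out, _ = self.attn(z, z, z)
        return out

ablation = PatchForecaster(backbone=OneAttnLayer())  # vs. a frozen LLM backbone
```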
Appendix for "When and How to Lift the Lockdown?" In this section, we provide a detailed comparison between different existing approaches for modeling fatality curves. A tabulated comparison between our model and existing ones is laid out in Table A1. Table A1: Comparison between existing approaches to modeling fatality curves. Because of the relative infrequency of pandemics, little related work has been done within the machine learning community to address this problem. In what follows, we provide a brief overview of previous works. Works prior to the current pandemic were primarily focused on learning contagion (diffusion) processes on networks, e.g., [15, 16]; unfortunately, these models do not directly apply to forecasting pandemic fatality curves. In response to the COVID-19 pandemic, two strands of research have emerged: (a) methods for devising optimal control (lockdown) policies to contain disease spread [17, 18, 19], and (b) models for forecasting disease spread and expected fatalities [5, 6, 7, 9, 10, 20, 21].
Large Pre-trained Time Series Models for Cross-Domain Time Series Analysis Tasks
Large pre-trained models have been vital to recent advances in domains like language and vision, making model training for individual downstream tasks more efficient and delivering superior performance. However, tackling time-series analysis tasks usually involves designing and training a separate model from scratch, leveraging training data and domain expertise specific to the task. We tackle a significant challenge for pre-training a foundational time-series model on multi-domain time-series datasets: extracting semantically useful tokenized inputs to the model across heterogeneous time series from different domains. We propose Large Pre-trained Time-series Models (LPTM), which introduce a novel adaptive segmentation method that automatically identifies the optimal dataset-specific segmentation strategy during pre-training. This enables LPTM to perform similarly to or better than domain-specific state-of-the-art models when fine-tuned for different downstream time-series analysis tasks, as well as under zero-shot settings. LPTM achieves superior forecasting and time-series classification results while using up to 40% less data and 50% less training time than state-of-the-art baselines.
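To illustrate what an adaptive, dataset-specific segmentation could look like (the variance-budget rule and greedy growth below are our assumptions; LPTM learns its segmentation strategy during pre-training), consider splitting a series into variable-length segments so that each segment is simple enough to serve as one input token:

```python
import numpy as np

def adaptive_segments(x, max_var=0.05, max_len=64):
    """Greedily grow a segment until its variance budget or length cap
    is exhausted; each resulting (lo, hi) slice becomes one input token."""
    bounds, start = [], 0
    for t in range(1, len(x) + 1):
        seg = x[start:t]
        if t - start >= max_len or (len(seg) > 1 and seg.var() > max_var):
            bounds.append((start, t - 1))
            start = t - 1
    bounds.append((start, len(x)))
    return bounds

# Smooth stretches yield long segments; noisy stretches yield short ones.
x = np.sin(np.linspace(0, 12, 512)) + 0.05 * np.random.randn(512)
tokens = [x[lo:hi].mean() for lo, hi in adaptive_segments(x)]  # toy token embedding
```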
Identification and Estimation of the Bi-Directional MR with Some Invalid Instruments
We consider the challenging problem of estimating causal effects from purely observational data in bi-directional Mendelian randomization (MR), where some invalid instruments, as well as unmeasured confounding, usually exist. To address this problem, most existing methods attempt to find valid instrumental variables (IVs) for the target causal effect using expert knowledge or by assuming that the causal model is a one-directional MR model. In this paper, we instead first theoretically investigate the identification of the bi-directional MR model from observational data. In particular, we provide necessary and sufficient conditions under which valid IV sets are correctly identified such that the bi-directional MR model is identifiable, including the causal directions of a pair of phenotypes (i.e., the treatment and the outcome). Moreover, based on this identification theory, we develop a cluster fusion-like method to discover valid IV sets and estimate the causal effects of interest. We theoretically demonstrate the correctness of the proposed algorithm. Experimental results show the effectiveness of our method for estimating causal effects in both one-directional and bi-directional MR models.
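To give a flavor of a cluster fusion-style search for valid IVs (a simplified sketch under our own assumptions: per-instrument Wald ratios and a tolerance-based grouping, not the paper's exact statistic or fusion rule), instruments whose individual effect estimates agree are fused into a cluster, and the largest cluster is taken as the valid IV set:

```python
import numpy as np

def wald_ratios(Z, X, Y):
    """Per-instrument ratio estimates beta_j = Cov(Z_j, Y) / Cov(Z_j, X)."""
    zx = np.array([np.cov(Z[:, j], X)[0, 1] for j in range(Z.shape[1])])
    zy = np.array([np.cov(Z[:, j], Y)[0, 1] for j in range(Z.shape[1])])
    return zy / zx

def largest_cluster(ratios, tol=0.1):
    """Fuse instruments whose estimates lie within tol of a cluster's anchor;
    the largest cluster is declared valid, and its mean is the fused effect."""
    clusters = []
    for j, r in enumerate(ratios):
        for c in clusters:
            if abs(ratios[c[0]] - r) < tol:
                c.append(j)
                break
        else:
            clusters.append([j])
    valid = max(clusters, key=len)
    return valid, ratios[valid].mean()
```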
Bayesian Optimization of Functions over Node Subsets in Graphs
We address the problem of optimizing functions defined on node subsets of a graph. Optimizing such functions is often non-trivial given their combinatorial, black-box, and expensive-to-evaluate nature. Although various algorithms have been introduced in the literature, most are either task-specific or computationally inefficient, and they exploit only the graph structure without considering the characteristics of the function. To address these limitations, we utilize Bayesian Optimization (BO), a sample-efficient black-box solver, and propose a novel framework for combinatorial optimization on graphs. More specifically, we map each k-node subset of the original graph to a node in a new combinatorial graph and adopt a local modeling approach to efficiently traverse the latter graph by progressively sampling its subgraphs using a recursive algorithm. Extensive experiments under both synthetic and real-world setups demonstrate the effectiveness of the proposed BO framework on various types of graphs and optimization tasks, and its behavior is analyzed in detail with ablation studies.
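A small sketch of the subset-to-node construction (the one-swap neighborhood and the plain hill climbing below are our assumptions, standing in for the paper's GP-guided, acquisition-driven traversal): each k-node subset becomes a vertex of a new combinatorial graph, adjacent to the subsets that differ from it by a single swap.

```python
import networkx as nx
import random

def subset_neighbors(G, S):
    """Subsets adjacent to S in the combinatorial graph: swap one node."""
    S = frozenset(S)
    for u in S:
        for v in set(G.nodes) - S:
            yield (S - {u}) | {v}

def local_search(G, f, k, steps=50, seed=0):
    """Greedy traversal of the combinatorial graph; BO would instead pick the
    next subset by maximizing an acquisition under a local surrogate model."""
    rng = random.Random(seed)
    S = frozenset(rng.sample(list(G.nodes), k))
    for _ in range(steps):
        best = max(subset_neighbors(G, S), key=f, default=S)
        if f(best) <= f(S):
            break
        S = best
    return S, f(S)

G = nx.karate_club_graph()
f = lambda S: nx.cut_size(G, S)  # example black-box objective on node subsets
print(local_search(G, f, k=4))
```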
Bayesian Optimization of Function Networks: Supplementary Material
In this section, we provide a formal statement and proof of Proposition 1. We begin by proving the following auxiliary result, which assumes that the functions $f_k \colon \mathbb{R}^{d_k} \to \mathbb{R}$, $k = 1, \ldots, K$, are Lipschitz continuous. We are now in a position to show Proposition 1, which can be seen as a simple generalization of Theorem 1 in Balandat et al. (2020), applied to the functions $\mathbb{R}^{d_k} \to \mathbb{R}$, $k = 1, \ldots, K$, given by the $f_k$. The desired result is now a direct consequence of Proposition 2 in the supplement of Balandat et al. (2020), which is in turn a consequence of Theorem 2.3 in Homem-de-Mello (2008).
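The corrupted span above most plausibly stated the auxiliary Lipschitz result; a hedged reconstruction (our paraphrase, not the authors' exact statement) is the standard fact that Lipschitz continuity is preserved under composition, which is what the sample-average approximation argument of Balandat et al. (2020) requires:

```latex
% Hedged reconstruction of the auxiliary result, not the authors' exact statement.
\begin{lemma}
Let $f_k \colon \mathbb{R}^{d_k} \to \mathbb{R}$, $k = 1, \ldots, K$, be
Lipschitz continuous with constants $L_k$. Then the network output, obtained by
composing the $f_k$ along the directed acyclic function network, is Lipschitz
continuous in the network inputs, with constant bounded by the products of the
$L_k$ along directed paths.
\end{lemma}
```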